Project Purpose and Goals
The following tables and visualizations are intended to give a broad overview of the programs, resources, and services high schools provided to English learner-classified students and their families, as reported in the questionnaire responses. The primary objective of this project was to practice beginning functional programming skills, so most of the effort went into creating the tables and visualizations rather than into ensuring sound or thorough analysis.
The project used both descriptive approaches, summarizing how services and programs differed across regions and locale types, and quantitative approaches, in the form of simple regression models. The questionnaire was deployed in the central, western, northeastern, and southeastern United States, so the original coding schemes were used to guide the analysis. In some cases, locale types were collapsed from the original categories (city, suburban, town, rural) into a dichotomous urban/rural category for ease of interpretation.
I imagined a research team as the audience for this project, meaning little interpretation is offered as it’s meant to be a springboard for discussion and further analysis.
Other things to note
In order to conduct analyses, I created a set of scores that would provide me with the necessary continuous variables to compare regional and locale-based differences for this dataset.
These include scores for:
* instructional services and programs;
* language supports offered;
* non-instructional services offered;
* high school and parent services offered (including translation and
interpretation services);
* and an overall score for resources offered.
In addition, I created a bilingual content score: a composite based on the types of services and programs offered to EL-identified students that avoids penalizing monolingual approaches outright. In other words, rather than simply indicating whether a school offered a given service or program, the bilingual score prioritizes bilingual approaches by awarding them a full score of 1, with English-based services and programs receiving a half score of 0.5.
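The scoring rule above can be sketched in a few lines. The original coding lives in data-clean.Rmd; this is an illustrative Python sketch, and the item structure (offered/bilingual flags) is a hypothetical simplification of the questionnaire items.

```python
# Illustrative sketch of the bilingual content scoring rule: bilingual
# offerings earn a full point, English-based offerings a half point, and
# items not offered earn nothing. Item structure is hypothetical.

def bilingual_item_score(offered: bool, is_bilingual: bool) -> float:
    """Score one service/program item."""
    if not offered:
        return 0.0
    return 1.0 if is_bilingual else 0.5

def bilingual_content_score(items) -> float:
    """Composite score: sum of item scores across a school's offerings."""
    return sum(bilingual_item_score(offered, bilingual) for offered, bilingual in items)

# Example: one bilingual program, two English-based ones, one not offered
example = [(True, True), (True, False), (True, False), (False, False)]
```

This down-weights English-only approaches rather than zeroing them out, so a school with several monolingual supports still scores above one with none.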
All scores were pulled and created from the corresponding subgroups
in the questionnaire:
* demographics
* instructional language services
* additional language supports
* non-instructional language services in the most common language for
ELs
* services available for ELs and their parents/guardians
For more information, please seek out the source code
(data-clean.Rmd and/or analysis.Rmd) which
includes notes on variable manipulation.
The following table is intended to describe how different services designed for EL-identified students are distributed across regions of the US. The score shown in each column is the average for that region on the given composite measure.
I anticipated that the average number of services would vary by region, which seemed to be the case, though to a much lesser degree than expected.
Unfortunately, across all regions the average services-and-programs scores were much lower than the maximum possible score. This may reflect how the composite scores were constructed, as schools very likely choose certain programs over others rather than offering multiple types of programs and services. In a future iteration of this analysis, I would select the criteria for composite scores more carefully, informed by practice.
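The regional averaging behind these tables amounts to a group-by-and-mean. The actual tables were built in analysis.Rmd; this is a standard-library Python sketch with made-up score values.

```python
# Sketch of the per-region averaging used for the tables: group schools by
# region, then take the mean of a composite score within each group.
# The rows and values here are illustrative, not questionnaire results.
from collections import defaultdict
from statistics import mean

rows = [
    {"region": "West", "instructional": 3.0},
    {"region": "West", "instructional": 4.0},
    {"region": "Central", "instructional": 2.0},
]

by_region = defaultdict(list)
for row in rows:
    by_region[row["region"]].append(row["instructional"])

regional_means = {region: mean(scores) for region, scores in by_region.items()}
```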
Similar to the table above, the following table is meant to demonstrate how different services are distributed with locale as the focal variable.
The final table considers the relationship between both region and locale, with the idea in mind that more differences would appear at a more granular level and that the differences may tell us something about how urban and rural schools differ in their approaches to providing services and programs for English learner-identified students.
Indeed, we are able to see some differences in scores with much higher scores for overall resources provided in western, central, and northeastern cities. (As a side note, it’s surprising that this dataset did not include the southwestern United States as it houses the highest concentrations of English learners.)
In a future iteration of this project, I would love to look at the specific services and programs offered rather than the combined score, as this seems to obscure some meaningful data that I did not have enough time to look at.
The following charts are meant to guide us further in considering the
relationship between the average scores by region and locale.
To create this visualization, I converted the raw scores from the composite measures displayed in the tables into percentages. For ease of comparison, cities and suburbs were condensed into a single urban value, and towns and rural areas were similarly condensed into a single rural value.
The mean difference between urban and rural values was used to consider how the differences compared in an easier-to-read format.
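The three steps above (percentage conversion, locale collapsing, mean difference) can be sketched as follows. This is illustrative Python rather than the project's R code, and the maximum score and example values are hypothetical.

```python
# Sketch of the transformation described above: convert raw composite scores
# to percentages of the maximum possible score, collapse the four locale
# codes into urban/rural, and take the urban-minus-rural mean difference.
from statistics import mean

# City and suburban collapse to urban; town and rural collapse to rural.
LOCALE_MAP = {"city": "urban", "suburban": "urban", "town": "rural", "rural": "rural"}

def to_percent(raw: float, max_score: float) -> float:
    return 100 * raw / max_score

# Hypothetical schools and a hypothetical maximum composite score
schools = [
    {"locale": "city", "score": 6.0},
    {"locale": "suburban", "score": 4.0},
    {"locale": "town", "score": 3.0},
    {"locale": "rural", "score": 1.0},
]
MAX_SCORE = 8.0

urban = [to_percent(s["score"], MAX_SCORE) for s in schools if LOCALE_MAP[s["locale"]] == "urban"]
rural = [to_percent(s["score"], MAX_SCORE) for s in schools if LOCALE_MAP[s["locale"]] == "rural"]
mean_difference = mean(urban) - mean(rural)
```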
An interesting thing to note is that the non-instructional services and supports score was drastically higher than any other composite measure. This composite covered services in the most common non-English language in the school, such as providing information about academic and career and technical education (CTE) programs, providing written translations of documents sent to parents/guardians, and providing interpreter services when necessary.
On the other hand, the language support score is especially low in rural schools nationally, and it is also low relative to the other scores in urban areas, though less markedly so than in rural schools. This composite measured the availability of tutoring, summer school, credit recovery options, mentoring, distance education options, and any other additional language-focused services.
This comparison got me thinking about the tension between compliance and innovation, and particularly how best to encourage schools to build these types of services that are beneficial for students overall and especially those who are historically marginalized. From this visualization we can see there is a lot of room for growth in building language-focused services.
I was intrigued by the differences this plot brought to the surface, and dug further by comparing the top differences across all groups. I considered pulling the top 5 differences for each region and locale separately, but I was curious whether one region would be more heavily represented than the others in the overall top 5.
This did, in fact, happen to be the case.
Differences between urban and rural areas were most pronounced in the Northeast, which accounted for 4 of the top 5 differences; the remaining one came from the central region. It may be interesting to dig further into why language-focused supports, instructional programs and models, high school student and parent services, and overall bilingual offerings differed so much between urban and rural areas in the Northeast.
It’s worth noting that the composite score for high school student and parent services showed up in 2 of the top 5 differences. This score resembled the non-instructional language services score in covering information about academic and CTE programs, written translations, and interpreter services for students and their families, but it differed by focusing on the other non-English languages represented in the school beyond the predominant one. That these services made up the 2 largest differences overall is very telling about what may be available for speakers of other languages. This may in part be due to representation, as urban areas often show more diversity in non-English language composition, but I am hesitant to make that speculation as it obscures the potential diversity of non-English languages spoken in rural areas, something which happens all too frequently as rural areas are (often incorrectly) assumed to be homogeneous, monolingual, White spaces.
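Pulling the top differences across all region-by-composite combinations is a simple sort-and-slice; a standard-library Python sketch follows, with illustrative (not actual) difference values.

```python
# Sketch of ranking urban-rural mean differences across all
# (region, composite score) combinations and keeping the largest few.
# The entries below are made up for illustration.
diffs = {
    ("Northeast", "language supports"): 22.0,
    ("Northeast", "instructional"): 18.0,
    ("Central", "bilingual"): 15.0,
    ("West", "non-instructional"): 9.0,
}

# Sort by the magnitude of the difference, largest first, and keep the top 3
top_3 = sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:3]
```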
The following visualization is another way to conceptualize how regional scores for subareas differed, using centered rather than raw scores.
As indicated in the plot, all scores were set to a mean of 0 to allow for a comparative view. From the graph, we can see how schools in the western United States scored above average in all areas except for services for high school students and parents speaking non-English languages other than the predominant non-English language in the school. There may be some interesting insights to gather from disaggregated data about the west, though I’m inclined to wonder if this may be due to California, where a large proportion of the nation’s English learner students attend school and which may have more robust services and programs as a result.
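The centering described here is a mean shift: each composite score is offset so the regional values average to 0, making above- and below-average regions directly comparable. A minimal Python sketch, with hypothetical values:

```python
# Minimal sketch of centering one composite score across regions: subtract
# the grand mean so the centered values average to 0. Scores are made up.
from statistics import mean

regional_scores = {"West": 5.0, "Central": 2.0, "Northeast": 3.0, "Southeast": 2.0}
grand_mean = mean(regional_scores.values())
centered = {region: score - grand_mean for region, score in regional_scores.items()}
```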
On the other hand, schools in the central United States scored below the mean in all composite scores. Returning to the question of California, I almost wonder whether, assuming it skewed the data, scores in the central, northeast, and southeast regions were pulled further below the mean than we might see otherwise. Either way, there is a lot of room to improve services and programs in these areas.
Finally, I’ll also point out that the northeast scored below the mean on all scores except services for parents and high schoolers speaking non-dominant non-English languages, which is surprising given that the mean difference plot showed the same category had the highest difference between urban and rural areas. This sparked further questions about the nature of the relationship between urban and rural areas in the Northeast.
The final task was to run simple regression models to see whether instructional supports, language-specific supports, or overall resource supports differed as a function of region. The results of these models are reported in the following tables, with significant results marked by asterisks.
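Because region is the sole (categorical) predictor, the fitted model reduces to group means: the intercept is the reference region's mean and each coefficient is another region's difference from it. This Python sketch illustrates that equivalence with made-up data; the actual models were fit in analysis.Rmd.

```python
# With one categorical predictor, the OLS fit of score ~ region is just
# the set of group means: intercept = reference region's mean, and each
# coefficient = that region's mean minus the reference mean.
# The data and reference level below are illustrative.
from collections import defaultdict
from statistics import mean

data = [("West", 4.0), ("West", 6.0), ("Central", 2.0), ("Central", 4.0)]

groups = defaultdict(list)
for region, score in data:
    groups[region].append(score)
group_means = {region: mean(scores) for region, scores in groups.items()}

reference = "Central"  # hypothetical reference level
intercept = group_means[reference]
coefficients = {r: m - intercept for r, m in group_means.items() if r != reference}
```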
Notes about the table
Per common conventions:
* A single asterisk (*) denotes p < .05.
* Double asterisks (**) denote p < .01.
* Triple asterisks (***) denote p < .001.
* n.s. denotes a result that was not significant.
The first table includes the results of the instructional supports
model, using the centered composite instructional score as the outcome
variable with region as the only predictor. Instructional language
services included:
* bilingual programs offered that cover core content areas
* two-way immersion or dual language programs offered that cover core
content areas
* integrated English as a second language (ESL) model offered
* pull-out/push-in ESL models offered
* English-speaking educational paraprofessional(s) available
* bilingual educational paraprofessional(s) that speak the primary
non-English language spoken by the majority of English
learner-identified students available
* sheltered content classrooms used
* other instructional supports used
The second table includes the results of the additional language supports model, using the centered composite support score as the outcome variable with region as the only predictor.
Additional language-focused supports and services included:
* tutoring services
* summer programs
* credit recovery options
* mentoring
* distance education options
* other supports and services offered
The final table includes the results of the overall resources model which used the centered composite score for total resources offered as the outcome variable and region as the sole predictor.